Goto

Collaborating Authors

 premature convergence



Efficient and Diverse Generative Robot Designs using Evolution and Intrinsic Motivation

Goff, Leni K. Le, Smith, Simón C.

arXiv.org Artificial Intelligence

Methods for generative design of robot physical configurations can automatically find optimal and innovative solutions for challenging tasks in complex environments. The vast search-space includes the physical design-space and the controller parameter-space, making it a challenging problem in machine learning and optimisation in general. Evolutionary algorithms (EAs) have shown promising results in generating robot designs via gradient-free optimisation. Morpho-evolution with learning (MEL) uses EAs to concurrently generate robot designs and learn the optimal parameters of the controllers. Two main issues prevent MEL from scaling to higher complexity tasks: computational cost and premature convergence to sub-optimal designs. To address these issues, we propose combining morpho-evolution with intrinsic motivations. Intrinsically motivated behaviour arises from embodiment and simple learning rules without external guidance. We use a homeokinetic controller that generates exploratory behaviour in a few seconds with reduced knowledge of the robot's design. Homeokinesis replaces costly learning phases, reducing computational time and favouring diversity, preventing premature convergence. We compare our approach with current MEL methods in several downstream tasks. The generated designs score higher in all the tasks, are more diverse, and are quickly generated compared to morpho-evolution with static parameters.


Model-Based Relative Entropy Stochastic Search Abbas Abdolmaleki, Jan Peters

Neural Information Processing Systems

Stochastic search algorithms are general black-box optimizers. Due to their ease of use and their generality, they have recently also gained a lot of attention in operations research, machine learning and policy search. Yet, these algorithms require a lot of evaluations of the objective, scale poorly with the problem dimension, are affected by highly noisy objective functions and may converge prematurely. To alleviate these problems, we introduce a new surrogate-based stochastic search approach. We learn simple, quadratic surrogate models of the objective function.


Investigating Premature Convergence in Co-optimization of Morphology and Control in Evolved Virtual Soft Robots

Mertan, Alican, Cheney, Nick

arXiv.org Artificial Intelligence

Evolving virtual creatures is a field with a rich history and recently it has been getting more attention, especially in the soft robotics domain. The compliance of soft materials endows soft robots with complex behavior, but it also makes their design process unintuitive and in need of automated design. Despite the great interest, evolved virtual soft robots lack the complexity, and co-optimization of morphology and control remains a challenging problem. Prior work identifies and investigates a major issue with the co-optimization process -- fragile co-adaptation of brain and body resulting in premature convergence of morphology. In this work, we expand the investigation of this phenomenon by comparing learnable controllers with proprioceptive observations and fixed controllers without any observations, whereas in the latter case, we only have the optimization of the morphology. Our experiments in two morphology spaces and two environments that vary in complexity show, concrete examples of the existence of high-performing regions in the morphology space that are not able to be discovered during the co-optimization of the morphology and control, yet exist and are easily findable when optimizing morphologies alone. Thus this work clearly demonstrates and characterizes the challenges of optimizing morphology during co-optimization. Based on these results, we propose a new body-centric framework to think about the co-optimization problem which helps us understand the issue from a search perspective. We hope the insights we share with this work attract more attention to the problem and help us to enable efficient brain-body co-optimization.


A Gentle Introduction to Premature Convergence

#artificialintelligence

Population-based optimization algorithms, like evolutionary algorithms and swarm intelligence, often describe their dynamics in terms of the interplay between selective pressures and convergence. For example, strong selective pressures result in faster convergence and likely premature convergence. Weaker selective pressures may result in a slower convergence (greater computational cost) although perhaps locate a better or even global optima. An operator with a high selective pressure decreases diversity in the population more rapidly than operators with a low selective pressure, which may lead to premature convergence to suboptimal solutions. A high selective pressure limits the exploration abilities of the population.


Evaluation-Time Bias in Evolutionary Algorithms

#artificialintelligence

An Evolutionary Algorithm (EA) is a subset of evolutionary computation in artificial intelligence. Evolutionary computation is a family of algorithms for global optimization inspired by biological evolution. The rapid development of the information age with Big Data has led to an increase in the size and complexity of the optimization problems. In the context of an EA, this eventually results in the expansion of the search space with the fitness evaluation (used for optimal solution search) computation cost of the individuals becoming extremely high [1]. In this article, we will mainly focus on the Parallel EA variant with its two types and then dive deep into understanding the problem of evaluation time-bias.


Fuzzy Mutation Embedded Hybrids of Gravitational Search and Particle Swarm Optimization Methods for Engineering Design Problems

Kar, Devroop, Ghosh, Manosij, Guha, Ritam, Sarkar, Ram, García-Hernández, Laura, Abraham, Ajith

arXiv.org Artificial Intelligence

Gravitational Search Algorithm (GSA) and Particle Swarm Optimization (PSO) are nature-inspired, swarm-based optimization algorithms respectively. Though they have been widely used for single-objective optimization since their inception, they suffer from premature convergence. Even though the hybrids of GSA and PSO perform much better, the problem remains. Hence, to solve this issue we have proposed a fuzzy mutation model for two hybrid versions of PSO and GSA - Gravitational Particle Swarm (GPS) and PSOGSA. The developed algorithms are called Mutation based GPS (MGPS) and Mutation based PSOGSA (MPSOGSA). The mutation operator is based on a fuzzy model where the probability of mutation has been calculated based on the closeness of particle to population centroid and improvement in the particle value. We have evaluated these two new algorithms on 23 benchmark functions of three categories (unimodal, multi-modal and multi-modal with fixed dimension). The experimental outcome shows that our proposed model outperforms their corresponding ancestors, MGPS outperforms GPS 13 out of 23 times (56.52%) and MPSOGSA outperforms PSOGSA 17 times out of 23 (73.91 %). We have also compared our results against those of recent optimization algorithms such as Sine Cosine Algorithm (SCA), Opposition-Based SCA, and Volleyball Premier League Algorithm (VPL). In addition, we have applied our proposed algorithms on some classic engineering design problems and the outcomes are satisfactory. The related codes of the proposed algorithms can be found in this link: Fuzzy-Mutation-Embedded-Hybrids-of-GSA-and-PSO.


Relative Entropy Regularized Policy Iteration

Abdolmaleki, Abbas, Springenberg, Jost Tobias, Degrave, Jonas, Bohez, Steven, Tassa, Yuval, Belov, Dan, Heess, Nicolas, Riedmiller, Martin

arXiv.org Machine Learning

We present an off-policy actor-critic algorithm for Reinforcement Learning (RL) that combines ideas from gradient-free optimization via stochastic search with learned action-value function. The result is a simple procedure consisting of three steps: i) policy evaluation by estimating a parametric action-value function; ii) policy improvement via the estimation of a local non-parametric policy; and iii) generalization by fitting a parametric policy. Each step can be implemented in different ways, giving rise to several algorithm variants. Our algorithm draws on connections to existing literature on black-box optimization and 'RL as an inference' and it can be seen either as an extension of the Maximum a Posteriori Policy Optimisation algorithm (MPO) [Abdolmaleki et al., 2018a], or as an extension of Trust Region Covariance Matrix Adaptation Evolutionary Strategy (CMA-ES) [Abdolmaleki et al., 2017b; Hansen et al., 1997] to a policy iteration scheme. Our comparison on 31 continuous control tasks from parkour suite [Heess et al., 2017], DeepMind control suite [Tassa et al., 2018] and OpenAI Gym [Brockman et al., 2016] with diverse properties, limited amount of compute and a single set of hyperparameters, demonstrate the effectiveness of our method and the state of art results. Videos, summarizing results, can be found at goo.gl/HtvJKR .


Stochastic Search In Changing Situations

Abdolmaleki, Abbas (University of Aveiro) | Simoes, David (University of Aveiro) | Lau, Nuno (University of Aveiro) | Reis, Luis Paulo (University of Minho) | Price, Bob (PARC) | Neumann, Gerhard (Technische Universität Darmstadt)

AAAI Conferences

Stochastic search algorithms are black-box optimizer of an objective function. They have recently gained a lot of attention in operations research, machine learning and policy search of robot motor skills due to their ease of use and their generality. However, when the task or objective function slightly changes, many stochastic search algorithms require complete re-learning in order to adapt thesolution to the new objective function or the new context. As such, we consider the contextual stochastic search paradigm. Here, we want to find good parameter vectors for multiple related tasks, where each task is described by a continuous context vector. Hence, the objective function might change slightly for each parameter vector evaluation. In this paper, we investigate a contextual stochastic search algorithm known as Contextual Relative Entropy Policy Search (CREPS), an information-theoretic algorithm that can learn from multiple tasks simultaneously. We show the application of CREPS for simulated robotic tasks.


Model-Based Relative Entropy Stochastic Search

Abdolmaleki, Abbas, Lioutikov, Rudolf, Peters, Jan R., Lau, Nuno, Reis, Luis Pualo, Neumann, Gerhard

Neural Information Processing Systems

Stochastic search algorithms are general black-box optimizers. Due to their ease of use and their generality, they have recently also gained a lot of attention in operations research, machine learning and policy search. Yet, these algorithms require a lot of evaluations of the objective, scale poorly with the problem dimension, are affected by highly noisy objective functions and may converge prematurely. To alleviate these problems, we introduce a new surrogate-based stochastic search approach. We learn simple, quadratic surrogate models of the objective function. As the quality of such a quadratic approximation is limited, we do not greedily exploit the learned models. The algorithm can be misled by an inaccurate optimum introduced by the surrogate. Instead, we use information theoretic constraints to bound the `distance' between the new and old data distribution while maximizing the objective function. Additionally the new method is able to sustain the exploration of the search distribution to avoid premature convergence. We compare our method with state of art black-box optimization methods on standard uni-modal and multi-modal optimization functions, on simulated planar robot tasks and a complex robot ball throwing task.The proposed method considerably outperforms the existing approaches.